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Abstract 

A new nonparametric estimator of a convex regression function in any 
dimension is proposed and its convergence properties are studied. We start 
by using any estimator of the regression function and we convexify it by 
taking the convex envelope of a sample of the approximation obtained. We 
prove that the uniform rate of convergence of the estimator is maintained 
after the convexification is applied. The finite sample properties of the 
new estimator are investigated by means of a simulation study and the 
application of the new method is demonstrated in examples. 

Keywords: approximation, convex regression, convexity, data-smoothing, non- 
parametric regression 

1 Introduction 

In the nonparametric regression model 

Y n = f(X n )+e n , 71 = 1,2,..., (1) 

where Y n 6 R, X n 6 M. d and e n is an error term, it is not uncommon to have 
strong presumptions on properties of / — such as monotonicity, convexity or 
concavity — which should be taken into account. 

Typical examples appear in economics (indirect utility, production or cost 
functions), medicine (dosage-response experiments) and biology (growth curves). 

A much studied case is the instance of a monotone regression function for 
d = 1, estimated by using least squares (see, e.g., Brunk, 1955; Mukerjee, 1988, 
and Barlow ct al., 1972 or Robertson et al., 1988 for a summary of this work). For 
convex (concave) regression Hildrcth (1954) proposed to use convex least square 
estimates, and Hanson & Pledger (1976) proved their consistency. Algorithms for 
computing these estimates were developed by Wu (1982) and Fraser & Massam 
(1989), and the rate of convergence was derived by Mammen (1991). Later 
Groeneboom et al. (2001) derived the asymptotic distribution of the estimator 
at a fixed point of positive curvature. In all of these works the estimates hold 
pointwise. 

Still in one dimension, one can avoid the complications of least squares 
techniques and use more conventional smoothing methods when / is convex (or 
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concave), as shown by Birke & Dette (2007). Using the fact that a differentiable 
function is convex (concave) if the derivative is increasing (decreasing) , they 
propose to first smooth the data using any constrained nonparametric estimate 
(kernel type, local polynomial, series or spline estimator), then compute the 
derivative of the smooth function thus obtained, which is isotonized and finally 
integrated to recover a convex estimation. As mentioned above, the isotonization 
of a function is something that has already been mastered in the non-parametric 
literature, and using those results the rates of convergence obtained by them are 
the usual in non-parametric regression. 

Unfortunately this technique can only be used in one dimension and with 
smooth convex functions and cannot be extended to higher dimensions, since 
there is no such simple characterization of convexity in M. d for d > 1 . 

As far as we know, little has been done in higher dimensions. Siem et al. 
(2005) (see also Hoffmann et al., 2006) present a multivariate data smoothing 
method using a linear program (for the £ and £°° norms) or quadratic program 
(for the £ 2 norm). Shih et al. (2006) develop an approximation method based 
on multivariate adaptive regression splines (MARS). But none of these articles 
present convergence results. 

We propose here a simple and fast method that can be used in any dimension 
and applied to any convex function, even if not too smooth. Like Birke and 
Dette, we start by using any approximating scheme on the data, but then 
we use a convexification step, consisting in taking the convex envelope of the 
approximating function just obtained. This last step can be done very quickly 
by current software such as QHULL (Barber et al., 1996), and the uniform 
rate of convergence of the approximation technique is maintained after the 
convexification is applied. 

More precisely, we obtain uniform error estimates, and the rate of convergence 
of the convex estimator is the same as that of the original estimator, thereby 
showing that the convexification step adds basically no further errors to the 
estimating step. 

The paper is organized as follows. In Section 2 we briefly review fundamental 
smoothing techniques. In Section 3 we show theoretical results on the convexi- 
fication step, and how the error estimates for the convex estimate are derived 
from the smoothing step. Finally, in Section 4 we apply these techniques to 
approximate several problems in dimensions d — 1 and d = 2. 

2 The smoothing step: review of the literature 

As we have already pointed out, our method of convexification inherits the L°° 
rate of convergence from whichever smoothing process is chosen for the model (1). 
We think it is appropriate, then, to briefly review rates of convergence in 
norm for some of the possible choices for such a process when no monotonicity 
or convexity assumptions are made on /. 

Most of the approximation techniques with known rates of convergence are of 
the so called smoothing type, where a variable kernel is used, and we will focus 
our attention on these. 

It should be noted that since there are many different schools and people 
involved, here we can give only partial references, leaving out several meaningful 
results available in the literature. 
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Perhaps the first ones to consider these problems were Devroye (1978) and 
Schuster & Yakowitz (1979). Devroye considered the Nadaraya- Watson regression 
estimator and proved the uniform convergence (without rates) for independent 
data, with fixed or random predictors belonging to M. d , whereas Schuster and 
Yakowitz considered more general kernels in one dimension, establishing orders of 
convergence in probability. Later these results were extended by several authors, 
among them Bicrcns (1983) and Collomb (1984). They extended the result to 
non-independent data and Collomb was the first to give strong rates for uniform 
convergence. Further results on uniform convergence rates for different settings 
such as robust estimation and other kind of non-independent data were given by 
Collomb & Hardle (1986), Roussas (1990), Boente & Fraiman (1991), Truong & 
Stone (1992) and Tran (1993). Extensions to spline estimators were given by 
Eggermont & LaRiccia (2006), and to uniform choice of bandwidth by Einmahl 
& Mason (2005, 2000), Dony (2008), Dony & Einmahl (2006), Dony & Mason 
(2008), and Dony et al. (2006) (see also the references therein). 

The asymptotic distribution of the maximal deviation between a non-par- 
ametric regression estimator and the true regression was first considered by 
Johnston (1982), extending to the regression context the results by Bickel & 
Rosenblatt (1973) and Rosenblatt (1976) on density estimation. For the case d = 1 
and random predictors, Johnston showed — under some regularity assumptions — 
the L°° asymptotic distribution of the kernel regression estimator, which allowed 
him to give uniform confidence intervals for the regression estimator. This 
result was extended by Konakov & Piterbarg (1984) to other kernel estimators 
and by Hardle (1989) to general estimators defined implicitly, as for example 
M-smoothers and local polynomial estimators. As far as we know these results 
were not extended to higher dimensions or non-independent data. 

3 A convex estimator and its convergence 

Let us assume that the variables X n in the model (1) take values on a bounded 
closed convex set Q C K d , and that / £ c € , where ^ is the set of (finite real 
valued) convex functions defined on Q. 

Q need not be polyhedral, but assuming its boundary is smooth except for a 
finite set of "corners", in practice we may approximate it by a polyhedron. Thus, 
from now on, for simplicity we will assume that Q is a polyhedron, and therefore 
it is the convex hull of its finite set of vertices. In particular, we assume that Q 
is compact. 

Let us assume that /„ is an estimator of /, defined in all of Q. To fix ideas, 
we may think that f n is obtained by considering the points (Xi, Yi), i — 1, . . . , n, 
by some procedure such as smoothing. Our purpose is to derive from /„ another 
estimator which is also convex. 

To do so, we consider a finite set Ai n C Q such that the convex hull of M. n 
is Q. The number of points in M. n need not be n and the points in A4 n might 
be completely unrelated to {Xi : i £ N}. 

We now let ££ n be the set of "convex functions below /„ on A4 n ", 

JS?„ = {yj £ V : iftx) < f n {x) for all x £ M n }, 
and define the convex estimator f^, associated with the estimator /„ and the 
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set M n by 



(2) 



Since M. n contains all the vertices of Q, it is easy to see that /° is well 
defined on Q and that /° S Furthermore, /° is piecewise linear, determined 
by the maximum of hyperplancs. In particular: 

Lemma 1. S j£? n . 

As is the "lower part" of the convex hull of the set {(x, / n (x)) : x G A4„}, 
we may take advantage of any of a number of algorithms for finding convex hulls 
in K d . For instance, QHULL (Barber et al., 1996) finds the convex hull of a 
finite set of points in any number of dimensions, and is really fast for dimensions 
d < 4. 

We are led to the following procedure for constructing a convex estimator f£ 



Procedure 2. Given Xi and (i = 1, 2, . . . ): 

Step 1. (Smoothing) Construct an estimator f n of f, for instance through a 
smoothing procedure using the values Xi and Yj for i — 1, . . . ,n. 

Step 2. (Grid of points) Choose S n > and A4 n C Q so that any x € Q is the 
convex combination of points in M n whose distance to x is not more 
than 5 n . 

Step 3. (Convexification) Construct /° as in (2), for instance by using a convex 
hull procedure such as QHULL. 

In Figure 1 we represent the steps of the procedure with an example: in (a) 
we show the data and the resulting estimator /„; in (b) we show the estimator 
and its values at the points of Ai n ; in (c) we show the convex estimator /° 
obtained from the values of /„ at M. n ; and in (d) we compare the original data 
and the convex estimator obtained. 

We now show that if in the Procedure 2, f n is a good approximation of /, 
then /° is a good approximation of / provided it satisfies: 

H-l. / is a continuous convex function defined on Q, with ||/||Lip — L < oo, 



and \x — y \ denotes the (Euclidean) distance between x and y in M. d . (Recall 
that convex functions on Q are locally Lipschitz, but here we require that 
/ be uniformly Lipschitz in all of Q.) 

Theorem 3. Suppose f satisfies H-l and let f n , S n , M n and f° be as in 

Procedure 2, with 



off: 



where 



ll/lkip 



sup{|/(x) - f(y)\/\x - y\ : x,y £ Q, x ^ y}. 



SUp{|/n(x) - f(x)\ : X £ M n } < & 



(3) 



Then, 



£ n < fn(x) - f(x) < e n + L5 n for all X £ Q . 
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Proof. Since / is convex and e„ is a constant, the function f — s n is convex. 
Moreover, f(x) — e n < f n (x) for all x G M. n implies that / — e n G JSf n , and by 
the definition of f° in (2), 

f(x) -£ n < fn( x ) for a11 X G Q, 

proving one inequality. 

For the other inequality, consider x G Q, and let it € M„ and A& > 0, 
fc = 1, . . . , d+ 1, be such that 



AfcXfe = x, Afe = 1, and |x — Xk\ < <5„ for k — 1, . . . , d + 1. 



A' 

Then, 



since /^G^, 

fe 

< A fe / n (xfc) by Lemma 1, 

fe 

<J2 X k(f(x k ) + e n ) by (3), 
fe 

^ A fc /(x fc ) j + e„ since ^A fc = l. 



; 7 fe 

Now, 1 1 /| I Lip = £ and \xk — x\ < 5 n , and therefore 

f(xk)<f(x)+LS n . 
Hence, since A& > and using again that Afc = 1, we conclude 

fn(x) < (j2 A fc (f(x) + LSn)\ + e n = f(x) + L8 n + e n , 



and the result follows. □ 

Remark. In the proof we have not used the finiteness of M n , and only the values 
of /„ on M n are used. 

Noticing that given S n > we may construct a finite set M n with the property 
that any x G Q is a convex combination of points in M. n whose distance to x is 
no more than S n , we have: 

Corollary 4. If f satisfies H-l, given an estimator f n of f and S n > 0, we 

may find M n and define f° according to Procedure 2, so that 

\\f n -/Hoc < \\fn- f\\oo+L5 n . 

Remark. In the extreme case where f n — f for all n, we have \\f n — f\\oc — 0, 
but ||/^ — f\\oo > in general (for instance, if M n is finite and / is not piecewise 
linear) . 

Corollary 4 tells us that the convex estimator obtained through the 
Procedure 2 inherits the approximation properties of the original estimator /„ , 
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and the rate of convergence is preserved or even bettered provided 8 n is small 
enough. 

To illustrate this behavior, let us consider the following well-known types of 
convergence of a sequence of nonnegative random variables (R n ) n to 0, where 
(r n ) is a bounded sequence of positive numbers (possibly converging to 0), and 
we have denoted by P the underlying probability measure: 

T-l. For every e > there exists M > such that sup n P(i?„ > Mr n ) < e. 

T-2. limn^oo P(i?„ > er n ) = for every e > 0. 

T-3. Rn = 0(r n ) or R n = o(r n ) a.s. 

T-4. For every e > 0, Yln=i ¥ ( R n > er n ) < oo. 

It is easy to see that: 

Theorem 5. If any of T-l through T-4 holds for R n = \\f n — f\\oo> then it also 
holds for R n — \\f£ — f\\oc, provided f satisfies H-l and f° is constructed as in 
Corollary 4 with S n — o(r n ). 

For example, Tran (1993) shows: 

Theorem 6. For j = 1,2,..., let {(Xj, Yj)}j be a strictly stationary sequence 
of random variables, where the Xj and the Yj are W 1 -valued and M.-valued, 
respectively. Suppose f(x) = E(Y | X = x) is estimated by 
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where I n (x) = {i : 1 < i < n, \Xi — x\ < h n }, and h n w (log(n)/n) 1 ^ d+2 ' ) . 

Then, under appropriate assumptions ( including adequate regularity condi- 
tions ), 

Wfn- /IU«(Q) = 0(h n ) a.s. 

Tran's result gives a T-3 type of convergence, and therefore (by Theorem 5) 
we have that under the same assumptions, 

Wfn ~ /llz°°(Q) = 0{h n ) a.s., 

provided we take 8 n — o(h n ) in Corollary 4. 

More elaborate types of convergence include exact asymptotic behavior. A 
very simple model might be, assuming X n uniformly distributed on Q: 

T-5. There exist a sequence (d n ) n converging to 0, and a random variable R 
such that 

V(r- 1 (R n -d n ) <t)^V(R<t), 
for every t £ R at which ¥(R < t) is continuous. 

It is not possible in general to carry over this convergence from R n = 
Wfn- f \\oo directly to Rn = - /||oo, as in general ||/° - /||oo could be much 
smaller than ||/„ — /||oo, and we cannot control ||/ n — /||oo solely in terms of 
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||/n — f\\oa and || /|| Lip- Needless to say, by enlargening r„ we may transform a 
T-5 type into, say, a T-2 type of convergence. 

Besides the interest in itself, the convergence of type T-5 allows us to find 
uniform confidence bands for the regression curve, which is a practical concern. 
More precisely, if T-5 is verified, for any a, < a < 1, we may find optimal (or 
near optimal) s so that 

< s) > 1 - a. (4) 



If this inequality holds for R„ = ||/ rl — /||oo and assuming /° is constructed 
as in Corollary 4 with 5 n — o(l) for all n, then (4) is valid for R n — ||/^ — /||oo, 
albeit not with optimal s. 

In other words, Corollary 4 allows us to convert a uniform confidence band 
for /„ of the form (4) into a (slightly different) uniform confidence band for f%. 

For instance, Johnston (1982, Theorem 2.1) shows: 

Theorem 7. Let (X\, Y\), . . . , (X ni Y n ) be a random sample from a bivariate 
population, with X uniformly distributed in Q = [0, 1], and consider the following 
estimator of f{x) = K(Y \ X = x), 

1 " 

fn(x) = —J2 Y i K (( X - X >)/ h n), (5) 

nii n . 

2—1 

where h n nT s for some S, 1/5 < 5 < 1/3, and K is a piecewise smooth density 
function with support in [—A, A], A > 1. 

Then, under appropriate regularity assumptions we have 



P( (2(5 log n) 1/2 
where 



sup r n 1 {x) (f n (x) - f(x)) - d 7i 

,0<x<l 



< t -> e' 



-2cxp (-t) 



/ K 2 (u) du x E(F 2 \ X = x) 



<{x) - J - ^ 1 '- (6) 



and d n = O ((26 log n) 1 / 2 ). 

Confidence bands follow immediately (Johnston, 1982, Corollary 3.1): 

Corollary 8. Assuming Theorem 7 holds, an approximate (1 — a) x 100% 
confidence band is 

f n (x) ± r„ (d n + c(a)(28\ognY 1 ' 2 ), 

where c(a) = log2— log | log(l— a)\ (for practical applications, one would estimate 
E(Y 2 \ X = x) in (&)). 

Theorem 7 and its corollary are still valid if instead of (5), /„ is a M— smoother 
estimator defined as a solution of 







1 " 

— ^(Yi-f n )K((x-Xi)/h n ), 



nh n 



with ip a bounded monotone, antisymmetric real function (Hardlc, 1989). 
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As a final remark, let us point out that we have only used that /„ approximates 
the Lipschitz convex function /, independently of whether /„ has been obtained 
through a smoothing procedure or any other approximation method. 

4 Numerical results 

In this section we report on some practical aspects of our algorithm and present 
some simulations and examples showing its performance. 

4.1 Implementation 

We implemented our algorithm using MATLAB. The smoothing step was done 
with local polynomials of degree 1 with Gauss's kernel, and for the convexification 
we used MATLAB's functions convhull (dimension 1) and convhulln (higher 
dimensions), which are based upon the QHULL algorithm described in Barber 
ct al. (1996). 

The bandwidth was chosen using cross-validation for the local-polynomial 
fitting at the data points. In the examples shown below, once the optimal 
bandwidth was chosen, the local-polynomial fitting function was computed at 
the same data points which were set a priori as design. Whenever the data 
points were not a priori designed, the local-polynomial fit was evaluated on a 
uniform grid having approximately the same number of points. 

4.2 One dimensional simulations 

In this section we briefly illustrate the finite sample properties of the convex 
estimate of the regression function by means of a simulation study. For this 
purpose we considered the same three examples presented in Birke & Dette 
(2007), namely, 



and Q = [0, 1]. Notice that even though the third function is just Lipschitz, all 
these functions satisfy the assumption H-l. 

As in Birke & Dette (2007), we ran some simulations with n = 100 uniformly 
distributed design points for the explanatory variables and added a normal noise 
with standard deviation a = 0.1 to the response variable. 

In Figure 2 we display for each regression function five typical estimates 
obtained from different simulation runs observing a typical performance. The 
estimates for the two smooth functions fi and / 2 are comparable to the regressions 
obtained in Birke & Dette (2007), but our estimates of the nonsmooth regression 
function fa exhibit a much closer fit. This is an advantage of our method, 



A(*) 






if < x < 1/4, 
if 1/4 < x < 3/4, 
if 3/4 < x, 
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Figure 2: Regression functions fa (left), fa (middle), fa (right), and their estimates. 
Result of 5 simulations for each regression function, with sample size n = 100 and 
normal errors with a = 0.1. The estimates are very reasonable, even for fa, which is 
just Lipschitz, and not C . 



which does not approximate the derivative of the regression function, and thus it 
demands less smoothness and approximates better non differentiable functions. 

In the second part of this simulation study we investigated the mean square 
error, bias and variance of our convex estimate. For this we considered again 
the three regression functions in fa, fa, fa and computed — with 2000 simulation 
runs — the curves for the mean square error, squared bias and variance. The 
results shown in Figure 3 look very much alike those in Birke & Dette (2007), 
except for the ones related to fa, where our estimator seems to be better. In 
this figure the mean square error, bias and variance of the estimator by local 
linear polynomials are represented by the dashed lines, while those quantities 
related to our convex estimator are represented by the solid lines. 

Finally, in Figure 4 we show approximate 95% confidence bands for one 
estimate to each of the previous regression functions. We ran a simulation with 
100 uniformly distributed design points for the explanatory variables and added 
a normal noise with a = 0.1 to the response variable. In order to use the existing 
results on the width of the confidence bands from Johnston (1982, Corollary 3.1) 
(see also Theorem 7 and its corollary) , the smoothing step was done with the 
formula 

1 " 

f n {x) = —Y,K{{x - Xi)/h n ), 
j=i 

where K(x) = | (1 — x 2 )+ is Epanechnikov's Kernel. This regression formula has 
bad approximation properties at the endpoints of the interval, which explains 
the mild misfit observed there. The width of the band was 0.1392, 0.1382, 0.1628, 
for the estimate corresponding to fa, fa, and fa, respectively. 

4.3 Rabbits' data 

We studied an example considered in Dudzinski & Mykytowycz (1961), who 
analyzed the relationship between age and eye lens weight for rabbits in Australia. 
This relationship is expected to be guided by a concave function. In this study, 
the dry weight of the eye lens was measured (in milligrams) for 71 free-living wild 
rabbits of known age (measured in days). A detailed description of the experiment 
and the data can be found in http : //www. statsci . org/data/oz/rabbit . html. 
The data was analyzed by Ratkowsky (1983) using a parametric nonlinear growth 
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Var Bias 2 MSE 




Figure 3: Variance (left), squared Bias (middle) and Mean Square Error (right) of 
our convex estimate (solid line) and of the local linear estimate (dashed line). These 
indicators were obtained with 2000 simulation runs, for fi (top), f% (middle) and 
(bottom), using 100 uniformly distributed design points for the explanatory variable, 
and normal error with a = 0.1 for the observed variable. Only small differences between 
the local linear estimate and our convex estimate are observed. In some cases, our 
convex estimate is even better. 




0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 



Figure 4: Approximate 95% confidence bands. In each plot we show the exact 
regression function, the estimate, the 95% confidence bands, and the data points for 
N = 100 design points, and normal error with a = 0.1. The bands have width 0.1392, 
0.1382, and 0.1628 for /i (left), f% (middle), jz (right), which were computed with the 
formula provided in Corollary 8. 
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Figure 5: Convex regression of the rabbits' data. Dry weight of the eye lens (milligrams) 
versus age (days). Plot of the local polynomial smoothing (dashed blue), the convex 
estimate (solid red), and data points (magenta). The smoothing step is based on local 
polynomials of degree 1 (left) and degree 2 (right). The fit obtained is really excellent, 
with no essential difference between degree 1 and 2. 



model, and by Birke & Dette (2007) with their non-parametric convex regression 
method. We used our method to obtain the concave regression, with the 
smoothing step performed with local polynomials of degree 1 and 2, and report 
the findings in Figure 5. In both cases, the bandwidth for the local polynomial 
smoothing was set using cross-validation, and the result of the smoothing step 
was evaluated at a uniform grid of 100 points. The convexification step yielded 
the estimated regression curves that can be observed in Figure 5 with an excellent 
fit to the data. 

4.4 Two dimensional simulations 

In this section we briefly illustrate the finite sample properties of the convex 
estimate of a regression function in two dimensions by means of a simulation 
study. For this purpose we considered the following convex regression function: 

f(xi,x 2 ) — max {2x1 + ^2/2, 3a; 1 + x 2 } , 

which is convex, and only Lipschitz. In Figure 6 we show the level curves of 
two estimated regression functions and the exact one in two simulations. We 
took uniform grids of 10 x 10, and 20 x 20 in each situation for the explanatory 
variable, and added normal error with a = 0.1 to the value of f{x\,X2) to 
emulate an observed variable. The level curves shown in the figure show a very 
good fit, even for a coarse grid of only 20 x 20 points. 

In the second part of this simulation study we investigated the mean square 
error, bias and variance of our convex estimate. For this we considered again the 
same two dimensional regression function / and calculated by 2000 simulation 
runs the surfaces for the mean square error, squared bias and variance. The 
results depicted in Figure 7 show that the variance is concentrated on the 
boundary but is one order of magnitude smaller than the squared bias and the 
mean square error. These last two quantities are concentrated on the region of 
the domain where the regression function is not C 1 . 
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Figure 6: Level curves of two estimated regression functions (dashed red and magenta) 
and the exact one (solid blue) in two simulations with uniform design for the explanatory 
variables. One for a grid of 10 x 10 (left) points and another for a grid of 20 x 20 
(right). The fit looks very good, even for a grid of only 20 x 20 points. 




Figure 7: Variance (left), squared Bias (middle) and Mean Square Error (right) for the 
two dimensional simulation. These indicators were obtained with 2000 simulation runs, 
using 10 x 10 (left) and 20 x 20 uniformly distributed design points for the explanatory 
variable, and normal error with a = 0.1 for the observed variable. 
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Figure 8: Pareto surface obtained as the convex regression of the data points 
(left). Contour curves of the convex graph (right) showing an excelent fit of the 
data (see also Figure 2 of Hoffmann ct al., 2006). 



4.5 Radiotherapy data 

We studied a two dimensional example considered in Siem et al. (2005) (see 
also Hoffmann ct al., 2006), who approximated the Pareto surface of a multiob- 
jective optimization problem arising in the computation of the precise radiation 
dose. This Pareto surface is convex under certain conditions, and it should 
be computed from some Pareto points that can be measured from the patient. 
We obtained data from a patient of the Radboud University Nijmegen Medical 
Centre, in Nijmegen, the Netherlands. The data correspond to a multiobjective 
optimization problem with three objectives and contains 69 data points, which, 
due to measuring errors, are not convex. 

By using our method we are able to smooth the data, obtaining a convex 
Pareto surface defined as a maximum of planes. This surface is initially defined 
on the convex hull of the X data, and we have extended it to a rectangular 
domain by considering the same maximum of planes. 

In Figure 8 we show the data points together with the convex regression 
surface (left), and the contours of the convex regression (right), showing an 
excelent fit of the data (see also Figure 2 in Hoffmann ct al. , 2006) . 
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